Multi-Source Neural Variational Inference

Author: Günnemann Stephan
Kurle Richard
van der Smagt Patrick
Publication date: 17/11/2018

Learning from multiple sources of information is an important problem in machine-learning research. The key challenges are learning representations and formulating inference methods that take into account the complementarity and redundancy of various information sources. In this paper we formulate a variational-autoencoder-based multi-source learning framework in which each encoder is conditioned on a different information source. This allows us to relate the sources via the shared latent variables by computing divergence measures between the individual sources' posterior approximations. We explore a variety of options to learn these encoders and to integrate the beliefs they compute into a consistent posterior approximation. We visualise learned beliefs on a toy dataset and evaluate our methods for learning shared representations and structured output prediction, showing trade-offs of learning separate encoders for each information source. Furthermore, we demonstrate how conflict detection and redundancy can increase robustness of inference in a multi-source setting. Comment: AAAI 2019, Association for the Advancement of Artificial Intelligence (AAAI) 201
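The abstract mentions two operations: measuring divergence between per-source posterior approximations, and integrating their beliefs into one posterior. As a rough illustration only, the sketch below assumes each source's belief is a diagonal Gaussian, uses the KL divergence as the divergence measure, and fuses beliefs with a product of Gaussians. These are assumptions made for the example; the abstract does not state which divergence or integration rule the authors use, and the function names are hypothetical.

```python
import math


def kl_diag_gaussians(mu_p, var_p, mu_q, var_q):
    """KL(p || q) for two diagonal Gaussians given as lists of means and variances."""
    return sum(
        0.5 * (math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0)
        for mp, vp, mq, vq in zip(mu_p, var_p, mu_q, var_q)
    )


def product_of_gaussians(mus, variances):
    """Fuse diagonal Gaussian beliefs by multiplying their densities.

    The fused precision is the sum of the source precisions, and the fused
    mean is the precision-weighted average of the source means.
    """
    dims = len(mus[0])
    fused_mu, fused_var = [], []
    for d in range(dims):
        precisions = [1.0 / var[d] for var in variances]
        total_precision = sum(precisions)
        fused_var.append(1.0 / total_precision)
        fused_mu.append(
            sum(p * mu[d] for p, mu in zip(precisions, mus)) / total_precision
        )
    return fused_mu, fused_var


# Two hypothetical one-dimensional source beliefs: N(0, 1) and N(2, 1).
mu, var = product_of_gaussians([[0.0], [2.0]], [[1.0], [1.0]])
print(mu, var)  # [1.0] [0.5]

# A large divergence between two sources could be read as a sign of conflict.
print(kl_diag_gaussians([0.0], [1.0], [2.0], [1.0]))  # 2.0
```

In this toy setting the fused belief is narrower than either source, which is one way redundancy between agreeing sources can sharpen inference.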

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications

I Hear You Eat and Speak: Automatic Recognition of Eating Condition and Food Type, Use-Cases, and Impact on ASR Performance

Author: Batliner Anton
El-Desoky Mousa Amr
Hantke Simone
Kurle Richard
Ringeval Fabien
Schuller Björn
Weninger Felix
Publication date: 01/01/2016

We propose a new recognition task in the area of computational paralinguistics: automatic recognition of eating conditions in speech, i.e., whether people are eating while speaking, and what they are eating. To this end, we introduce the audio-visual iHEARu-EAT database featuring 1.6k utterances of 30 subjects (mean age: 26.1 years, standard deviation: 2.66 years, gender balanced, German speakers), six types of food (Apple, Nectarine, Banana, Haribo Smurfs, Biscuit, and Crisps), and read as well as spontaneous speech, which is made publicly available for research purposes. We start by demonstrating that for automatic speech recognition (ASR), it pays off to know whether speakers are eating or not. We also propose automatic classification using both brute-forced low-level acoustic features and higher-level features related to intelligibility, obtained from an Automatic Speech Recogniser. Prediction of the eating condition was performed with a Support Vector Machine (SVM) classifier employed in a leave-one-speaker-out evaluation framework. Results show that the binary prediction of eating condition (i.e., eating or not eating) can be easily solved independently of the speaking condition; the obtained average recalls are all above 90%. Low-level acoustic features provide the best performance on spontaneous speech, which reaches up to 62.3% average recall for multi-way classification of the eating condition, i.e., discriminating the six types of food, as well as not eating. The early fusion of features related to intelligibility with the brute-forced acoustic feature set improves the performance on read speech, reaching a 66.4% average recall for the multi-way classification task. Analysing features and classifier errors leads to a suitable ordinal scale for eating conditions, on which automatic regression can be performed with up to 56.2% determination coefficient.
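The evaluation protocol described in the abstract (leave-one-speaker-out folds scored by average recall) can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes "average recall" means the unweighted mean of per-class recalls, the helper names are hypothetical, and the SVM classifier itself is left out so that the example needs only the standard library.

```python
from collections import defaultdict


def leave_one_speaker_out(speakers):
    """Yield (held_out_speaker, train_indices, test_indices), one fold per speaker.

    `speakers` holds one speaker ID per utterance, so a speaker's utterances
    never appear in both the training and the test part of a fold.
    """
    for held_out in sorted(set(speakers)):
        train = [i for i, s in enumerate(speakers) if s != held_out]
        test = [i for i, s in enumerate(speakers) if s == held_out]
        yield held_out, train, test


def average_recall(y_true, y_pred):
    """Mean of per-class recalls, with every class weighted equally."""
    hits, totals = defaultdict(int), defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        totals[truth] += 1
        hits[truth] += int(truth == pred)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


# Toy data: four utterances from three hypothetical speakers.
speakers = ["a", "a", "b", "c"]
for held_out, train, test in leave_one_speaker_out(speakers):
    print(held_out, train, test)

# Recall is 0.5 for "eating" and 1.0 for "not_eating", so the mean is 0.75.
print(average_recall(
    ["eating", "eating", "not_eating", "not_eating"],
    ["eating", "not_eating", "not_eating", "not_eating"],
))
```

Weighting classes equally keeps a majority class from dominating the score, which matters when the conditions (six food types plus not eating) are not equally represented.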

Directory of Open Access Journals

PubMed Central

Spiral - Imperial College Digital Repository

I hear you eat and speak: automatic recognition of Eating Condition and food type, use-cases, and impact on ASR performance

Author: Batliner Anton
Hantke Simone
Kurle Richard
Mousa Amr El-Desoky
Ringeval Fabien
Schuller Björn
Weninger Felix
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 13/01/2020

OPUS Augsburg

Deep Explicit Duration Switching Models for Time Series

Author: Ali Caner Turkmen
Ansari Abdul Fatir
Benidis Konstantinos
Januschowski Tim
Kurle Richard
Smola Alexander
Soh Soon Hong Harold
Wang Bernie
Publication date: 18/01/2022

Neural Information Processing Systems (NeurIPS)

ScholarBank@NUS

Degree of high-frequency noise of the words ‘warmed up’ caused by eating.

Author: Amr El-Desoky Mousa (3170589)
Anton Batliner (3170580)
Björn Schuller (504303)
Fabien Ringeval (3170574)
Felix Weninger (504301)
Richard Kurle (3170583)
Simone Hantke (3170586)

Subjects (left: female, right: male) while recording an utterance eating a banana (top), without eating any food (middle), and eating crisps (bottom).

FigShare